Use this template to complete your project throughout the course. Your Final Project presentation in class will be based on the contents of this document. Replace the title/name and text below with your own, but leave the headers.

Overview

In this section, give a brief description of your project and its goal, the data you are using to complete it, and the three faculty/staff members in different fields you have spoken with about your project, with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.

Advances in 3D ultrasound imaging have revealed associations between early placental volume and fetal growth outcomes. However, a more detailed evaluation of placental shape remains elusive, and such an evaluation could lead to new quantitative metrics for predicting fetal growth and outcomes. The goal of this project is to determine whether morphological characteristics of the placenta, derived from first-trimester 3D ultrasound, can predict whether a baby will be classified as “small for gestational age” (birth weight at or below the 10th percentile for gestational age).

Morphological measurements of the placenta extracted from first-trimester 3D ultrasound will be used in a model that evaluates associations between these measurements, maternal characteristics, and fetal outcome (i.e., whether a baby is “small for gestational age”, with birth weight at or below the 10th percentile). The anonymized dataset, available from the Department of Maternal Fetal Medicine at Penn, includes first-trimester placental measurements and patient characteristics for 600 subjects. A subset of these data (62 subjects) contains a larger set of 3D ultrasound characteristics. The model will be evaluated on both the entire dataset and the 62-subject subset.
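A minimal sketch of one way such a model could be set up: a logistic regression of SGA status (`sga_10th`) on placental and maternal variables. It runs on simulated data, not the Penn dataset, and the predictor set (`Vsnap`, `Tcmrep_mean`, `maternal_age_US1`, named after variables created in the Methods code) is a hypothetical choice rather than the project's final model.

```r
# Minimal sketch: logistic regression of SGA status on placental and maternal
# variables. The data are SIMULATED for illustration only, and the predictor
# set is a hypothetical choice, not the project's final model.
set.seed(503)
n <- 200
sim <- data.frame(
  Vsnap = rnorm(n, mean = 70, sd = 20),            # placental volume (simulated)
  Tcmrep_mean = rnorm(n, mean = 1.3, sd = 0.3),    # mean thickness (simulated)
  maternal_age_US1 = rnorm(n, mean = 30, sd = 5)   # maternal age (simulated)
)

# Simulate an outcome in which smaller placentas raise the odds of SGA
lin <- -2 - 0.03 * (sim$Vsnap - 70)
sim$sga_10th <- factor(rbinom(n, 1, plogis(lin)),
                       levels = c(0, 1), labels = c("no", "yes"))

# With a factor response, binomial glm models P(second level) = P("yes")
fit <- glm(sga_10th ~ Vsnap + Tcmrep_mean + maternal_age_US1,
           family = binomial, data = sim)
summary(fit)

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 3)

# Predicted probability of SGA for each subject
sim$p_sga <- predict(fit, type = "response")
```

In the 60-subject 3D subset, the small number of SGA cases would likely limit how many predictors a model like this can support.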

Introduction

In the first paragraph, describe the problem addressed, its significance, and some background to motivate the problem.

In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.

Methods

In the first paragraph, describe the data used and general methodological approach. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.

First, we read and clean the data. The clinical variables used are:

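Variable name     Source column        Description
model_id          Model number         model ID number
study_id          Study ID             study ID number
race              RACE                 race (raw codes 7, 44, 1, 2, 3; recoded 1 = white, 2 = black, 3 = asian, other codes set to NA)
wtscrn            WTSCRN (kg)          maternal weight (kg)
height            HT (in)              maternal height (in)
sbpscrn           SBPSCRN
crl               CRL (mm)             crown-rump length (mm)
pappamom          PAPPAMOM
gadel             GADEL
fetal_sex         FETSEX1              fetal sex (recoded 0 = male, 1 = female)
birthwt           BIRTHWT (g)          birth weight (g)
maternal_age_US1  Maternal Age at US1  maternal age at first ultrasound exam (years)
gest_age_US1      GA US1               gestational age at first ultrasound exam (days)
sga_5th           SGA<5th%             small for gestational age, 5th-percentile cutoff (0 = no, 1 = yes)
sga_10th          SGA<10th%            small for gestational age, 10th-percentile cutoff (0 = no, 1 = yes)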
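The race row lists codes 7 and 44 in addition to 1 to 3, but the cleaning code below declares only levels 1, 2, and 3 when it builds the `race` factor. This toy sketch (made-up values, not study data) shows what that does: `factor()` turns any value not listed in `levels` into `NA`. It also shows why converting through `as.character()` before `as.numeric()` is a useful safeguard, since `as.numeric()` on a factor returns the internal level indices rather than the printed values.

```r
# Toy values (made up, not study data) mimicking the raw race column
race_raw <- c("1", "2", "3", "7", "44")

# Pitfall: as.numeric() on a factor returns level indices, not values.
# Levels sort as text ("10" < "20" < "5"), so the indices are 1, 2, 3.
f <- factor(c("10", "20", "5"))
as.numeric(f)                 # 1 2 3
as.numeric(as.character(f))   # 10 20 5

# Safe conversion: text -> number
race_num <- as.numeric(as.character(race_raw))

# Only codes listed in `levels` are kept; 7 and 44 become NA
race <- factor(race_num, levels = c(1, 2, 3),
               labels = c("white", "black", "asian"))
race                          # values: white, black, asian, NA, NA
table(race, useNA = "ifany")
```

If subjects with codes 7 or 44 should remain in the race comparisons, those codes would need their own levels and labels.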
library(readxl)
library(tidyverse)
# Project directory (machine-specific path; adjust for your own system)
setwd("/Users/alison/BMIN503_placenta")

# Read in the entire clinical sheet
df.clinical <- read_xlsx("/Users/alison/Desktop/BMIN_503/final_project/placenta_subject_data.xlsx")

# Select the variables of interest and rename them (new name = original column name)
df.clinical.vars <- df.clinical %>%
    select(model_id = "Model number", 
           study_id = "Study ID", 
           race = "RACE", 
           wtscrn = "WTSCRN (kg)", 
           height = "HT (in)", 
           sbpscrn = "SBPSCRN", 
           crl = "CRL (mm)", 
           pappamom = "PAPPAMOM", 
           gadel = "GADEL", 
           fetal_sex = "FETSEX1", 
           birthwt = "BIRTHWT (g)", 
           maternal_age_US1 = "Maternal Age at US1", 
           gest_age_US1 = "GA US1",
           sga_5th = "SGA<5th%",
           sga_10th = "SGA<10th%") 

# Convert the selected columns to numeric; going through as.character() first
# guards against columns that were read in as factors
variables.numeric <- c("race", "wtscrn", "height", "sbpscrn", "crl", "pappamom", "gadel",
                       "fetal_sex", "birthwt", "maternal_age_US1", "gest_age_US1",
                       "sga_5th", "sga_10th")
df.clinical.vars <- df.clinical.vars %>%
    mutate_at(variables.numeric, function(x) as.numeric(as.character(x)))

# Convert coded variables to labeled factors.
# Note: race codes other than 1, 2, 3 (e.g., 7 and 44) become NA here.
df.clinical.vars <- df.clinical.vars %>%
    mutate(race = factor(race, levels = c(1, 2, 3), labels = c("white", "black", "asian")),
           fetal_sex = factor(fetal_sex, levels = c(0, 1), labels = c("male", "female")),
           sga_5th = factor(sga_5th, levels = c(0, 1), labels = c("no", "yes")),
           sga_10th = factor(sga_10th, levels = c(0, 1), labels = c("no", "yes")))

# Read VOCAL measurements (AIUM data)
df.vocal <- read_xlsx("/Users/alison/Desktop/BMIN_503/final_project/placenta_vocal_measures.xlsx")

# Keep the study ID (numeric, for joining) plus VOCAL volume, thickness, and CRL
df.vocal <- df.vocal %>%
  mutate(study_id = as.numeric(`Study ID`)) %>%
  rename(Vvocal = VolumeA, Tvocal = ThicknessA, CRLvocal = `CRL (mm)`) %>%
  select(study_id, Vvocal, Tvocal, CRLvocal)

# Merge clinical and VOCAL data by study ID, then keep model IDs 1 to 60
df.clinical.all <- inner_join(df.clinical.vars, df.vocal, by = "study_id") %>%
  filter(model_id %in% 1:60)

# Read 3DUS measurements and rescale them. Assumption: the raw file stores
# volume in mm^3 and thickness in mm, so /1000 gives cm^3 and /10 gives cm.
df.measures.3d <- read.csv("/Users/alison/Desktop/BMIN_503/final_project/sga_study_allsubjects.csv") %>%
  rename(model_id = Model) %>%
  mutate(Vsnap = Vmanual / 1000,
         Tcmrep_mean = thickness_mean / 10,
         Tcmrep_max = thickness_max / 10)

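# Combine clinical/VOCAL data with the 3DUS measurements by model ID;
# columns present in both tables (e.g., fetal_sex) receive .x/.y suffixes
df.merge <- inner_join(df.clinical.all, df.measures.3d, by = "model_id")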
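The plotting code refers to `fetal_sex.x`, which suggests that the 3DUS file also contains a `fetal_sex` column. When both tables in a join share a non-key column name, `dplyr::inner_join()` keeps both copies and appends `.x` (left table) and `.y` (right table); it also drops, without warning, any subject missing from either table. A toy sketch with made-up tables standing in for `df.clinical.all` and `df.measures.3d`:

```r
library(dplyr)

# Made-up tables (not study data); both contain a fetal_sex column
clin <- data.frame(model_id = c(1, 2, 3, 4),
                   fetal_sex = c("male", "female", "male", "female"),
                   stringsAsFactors = FALSE)
meas <- data.frame(model_id = c(2, 3, 4, 5),
                   fetal_sex = c(1, 0, 1, 0),
                   Vsnap = c(55, 70, 62, 80))

merged <- inner_join(clin, meas, by = "model_id")
names(merged)   # "model_id" "fetal_sex.x" "fetal_sex.y" "Vsnap"
nrow(merged)    # 3: model_id 1 (clinical only) and 5 (3D only) are dropped

# anti_join() lists the subjects that inner_join() discarded from `clin`
anti_join(clin, meas, by = "model_id")

# Alternatively, give the duplicated columns readable suffixes
inner_join(clin, meas, by = "model_id", suffix = c("_clinical", "_3d"))
```

Running `anti_join()` on the real tables before merging is a quick way to confirm which subjects are lost at each join.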

Here we perform the exploratory data analysis.

library(GGally)
library(ggplot2)
library(cowplot)
library(ggthemes)
library(plotly)
# Maternal characteristics relative to sga_10th
g1 <- ggplot(data=df.merge, aes(x=race, fill=sga_10th)) +
        geom_bar(position="stack")
g2 <- ggplot(data=df.merge, aes(x=sga_10th,y=sbpscrn)) +
        geom_boxplot()
g3 <- ggplot(data=df.merge, aes(x=sga_10th,y=height)) +
        geom_boxplot()
g4 <- ggplot(data=df.merge, aes(x=sga_10th,y=wtscrn)) +
        geom_boxplot()

plot_grid(g1, g2, g3, g4, ncol = 2, labels="AUTO")

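# Fetal sex and maternal characteristics relative to sga_10th
# (f1, f4, and f5 repeat panels A, C, and D of the previous grid)
f1 <- ggplot(data=df.merge, aes(x=race, fill=sga_10th)) +
        geom_bar(position="stack")
# fetal_sex.x is the clinical copy of fetal_sex (suffix added by inner_join)
f2 <- ggplot(data=df.merge, aes(x=fetal_sex.x, fill=sga_10th)) +
        geom_bar(position="stack")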
f3 <- ggplot(data=df.merge, aes(x=sga_10th,y=maternal_age_US1)) +
        geom_boxplot()
f4 <- ggplot(data=df.merge, aes(x=sga_10th,y=height)) +
        geom_boxplot()
f5 <- ggplot(data=df.merge, aes(x=sga_10th,y=wtscrn)) +
        geom_boxplot()

plot_grid(f1, f2, f3, f4, f5, ncol = 2, labels="AUTO")
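The bar charts and boxplots above can be complemented by a numeric summary, which is often easier to report in the Results. A base-R sketch on a small made-up data frame that reuses the column names of `df.merge` (`sga_10th`, `race`, `height`, `wtscrn`); on the real data, the same calls would take `df.merge` instead of `eda`:

```r
# Made-up data with the same column names as df.merge (not study data)
eda <- data.frame(
  sga_10th = factor(c("no", "no", "yes", "no", "yes", "no"), levels = c("no", "yes")),
  race     = factor(c("white", "black", "black", "asian", "white", "white"),
                    levels = c("white", "black", "asian")),
  height   = c(64, 66, 61, 63, 62, 65),   # inches
  wtscrn   = c(70, 82, 58, 60, 55, 75)    # kg
)

# Counts and within-race proportions of SGA (the bar charts, as a table)
counts <- table(eda$race, eda$sga_10th)
counts
round(prop.table(counts, margin = 1), 2)

# Median of each continuous variable by SGA group (the boxplots, as numbers).
# Note: the formula method drops rows with NA in any listed column.
medians <- aggregate(cbind(height, wtscrn) ~ sga_10th, data = eda, FUN = median)
medians
```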

#g6 <- ggplot(data=df.merge, aes(x=sga_10th,y=gadel)) +
#        geom_boxplot()

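# Volume analysis: VOCAL volume (Vvocal) vs. 3DUS volume (Vsnap), SGA status as point shape
g <- ggplot(data=df.merge, aes(x=Vvocal,y=Vsnap)) +
    geom_smooth(method = "lm", color="black",size=0.1) + 
    geom_point(aes(shape=sga_10th,text=paste("Model ID:",model_id,"Study ID:",study_id)),color="black",size=2) +
    scale_shape_manual(values=c(1,3)) 
# ggplot2 ignores the `text` aesthetic; ggplotly() uses it for hover tooltips
ggplotly(g)
# Gaussian glm (equivalent to linear regression) of Vvocal on Vsnap;
# note that the smooth line above fits the reverse direction (Vsnap on Vvocal)
glm.vol <- glm(Vvocal~Vsnap,data=df.merge)
summary(glm.vol)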
## 
## Call:
## glm(formula = Vvocal ~ Vsnap, data = df.merge)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -50.834  -10.500   -0.016    6.110   93.790  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.9789     7.5478   1.720   0.0908 .  
## Vsnap         0.6628     0.0907   7.308 8.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 431.7668)
## 
##     Null deviance: 48101  on 59  degrees of freedom
## Residual deviance: 25042  on 58  degrees of freedom
## AIC: 538.31
## 
## Number of Fisher Scoring iterations: 2
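With no `family` argument, `glm()` fits a gaussian model with an identity link, so `glm.vol` is an ordinary least-squares regression and gives the same coefficients as `lm()`. For such a fit, the residual deviance is the residual sum of squares and the null deviance is the total sum of squares, so R² = 1 - residual deviance / null deviance; from the printed output, 1 - 25042/48101 ≈ 0.479. A short sketch that checks the equivalence on simulated data; `r2_from_glm()` is a hypothetical helper, not a base R function:

```r
# Simulated data (not study data) to check that a gaussian glm equals lm
set.seed(1)
x <- rnorm(60)
y <- 2 + 0.5 * x + rnorm(60)

fit_glm <- glm(y ~ x)   # family = gaussian (identity link) by default
fit_lm  <- lm(y ~ x)

# Same coefficients
all.equal(coef(fit_glm), coef(fit_lm))

# Hypothetical helper: R^2 from the deviances stored in a gaussian glm
r2_from_glm <- function(fit) 1 - fit$deviance / fit$null.deviance
all.equal(r2_from_glm(fit_glm), summary(fit_lm)$r.squared)

# Applied to the deviances printed for glm.vol above
r2_vol <- 1 - 25042 / 48101
round(r2_vol, 3)   # 0.479
```

On the real fit, `r2_from_glm(glm.vol)` should reproduce this value up to the rounding of the printed deviances.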
# Thickness analysis: VOCAL thickness (Tvocal) vs. mean 3DUS thickness (Tcmrep_mean)
tmean <- ggplot(data=df.merge, aes(x=Tvocal,y=Tcmrep_mean)) +
         geom_smooth(method = "lm", color="black",size=0.1) + 
         geom_point(aes(shape=sga_10th,text=paste("Model ID:",model_id)),color="black",size=2) +
         scale_shape_manual(values=c(1,3)) 
ggplotly(tmean)
glm.thickness_mean = glm(Tvocal~Tcmrep_mean,data=df.merge)
summary(glm.thickness_mean)
## 
## Call:
## glm(formula = Tvocal ~ Tcmrep_mean, data = df.merge)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.86258  -0.21914  -0.07454   0.20751   0.83857  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.2024     0.1645   7.308  8.9e-10 ***
## Tcmrep_mean   0.4136     0.1207   3.427  0.00113 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1193708)
## 
##     Null deviance: 8.3253  on 59  degrees of freedom
## Residual deviance: 6.9235  on 58  degrees of freedom
## AIC: 46.707
## 
## Number of Fisher Scoring iterations: 2
# VOCAL thickness (Tvocal) vs. maximum 3DUS thickness (Tcmrep_max)
tmax <- ggplot(data=df.merge, aes(x=Tvocal,y=Tcmrep_max)) +
         geom_smooth(method = "lm", color="black",size=0.1) + 
         geom_point(aes(shape=sga_10th,text=paste("Model ID:",model_id)),color="black",size=2) +
         scale_shape_manual(values=c(1,3)) 
ggplotly(tmax)
glm.thickness_max = glm(Tvocal~Tcmrep_max,data=df.merge)
summary(glm.thickness_max)
## 
## Call:
## glm(formula = Tvocal ~ Tcmrep_max, data = df.merge)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.83248  -0.20510  -0.05782   0.15915   0.83063  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.22858    0.15902   7.726 1.76e-10 ***
## Tcmrep_max   0.23178    0.06847   3.385  0.00128 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.119859)
## 
##     Null deviance: 8.3253  on 59  degrees of freedom
## Residual deviance: 6.9518  on 58  degrees of freedom
## AIC: 46.952
## 
## Number of Fisher Scoring iterations: 2
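Both thickness models use the same response (`Tvocal`) and the same 60 subjects (59 null degrees of freedom), so their printed deviances and AICs can be compared directly. Since a gaussian `glm()` is a least-squares fit, R² = 1 - residual deviance / null deviance: about 0.168 for mean thickness and 0.165 for maximum thickness. The AICs (46.707 vs. 46.952) differ by about 0.25; a common rule of thumb treats AIC differences below about 2 as little evidence for preferring one model. The arithmetic, using only numbers copied from the output above:

```r
# Numbers copied from the printed summaries of glm.thickness_mean and
# glm.thickness_max (both: Tvocal as response, 60 subjects)
null_dev  <- 8.3253
resid_dev <- c(mean = 6.9235, max = 6.9518)
aic       <- c(mean = 46.707, max = 46.952)

# R^2 = 1 - residual deviance / null deviance (gaussian glm = OLS)
r2 <- 1 - resid_dev / null_dev
round(r2, 3)          # mean 0.168, max 0.165

# AIC difference between the two single-predictor models
delta_aic <- aic["max"] - aic["mean"]
round(delta_aic, 3)   # 0.245
```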

Results

Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.